Analyzing large data sets: rbcL 500 revisited.
Abstract
In 1993, Mark Chase and 41 coauthors published phylogenetic analyses of two very large data sets of nucleotide sequences of the chloroplast gene rbcL, which encodes the large subunit of ribulose 1,5-bisphosphate carboxylase. Their paper was important for several reasons. These analyses were (and still are) among the largest ever attempted using parsimony. The assembly of such a large number of sequences clearly demonstrated a high level of cooperation on the part of the botanical systematics community. Furthermore, a number of important new hypotheses regarding seed plant phylogeny emerged from this study, and it has helped to orient many subsequent phylogenetic analyses. Increasingly, the Chase et al. trees are being used in quantitative comparative analyses (e.g., Barraclough et al., 1996; also see Donoghue and Ackerly, 1996, and associated papers). We reanalyzed one of the Chase et al. data sets for two reasons. First, we wanted to explore the general methodological and theoretical issues raised by very large data sets. It is critical that these issues be addressed now because the number of large data sets is increasing rapidly. Second, in view of its importance, we wanted to discover the effects of long search times and alternative search strategies on this data
Similar resources
Application of Benford’s Law in Analyzing Geotechnical Data
Benford’s law predicts the frequency of the first digit of numbers met in a wide range of naturally occurring phenomena. In data sets that follow Benford’s law, numbers start with a small leading digit more often than with a large one. The law can be used as a tool for detecting fraud, abnormalities, and fabricated values in number sets. This can be used as an ...
Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms.
We developed PCR primers against highly conserved regions of the rRNA operon located within the inverted repeat of the chloroplast genome and used these to amplify the region spanning from the 3' terminus of the 23S rRNA gene to the 5' terminus of the 5S rRNA gene. The sequence of this roughly 500-bp region, which includes the 4.5S rRNA gene and two chloroplast intergenic transcribed spacer reg...
Analyzing Multi-locus Plant Barcoding Datasets with a Composition Vector Method Based on Adjustable Weighted Distance
Background: The composition vector (CV) method has proved to be a reliable and fast alignment-free method for analyzing large COI barcoding data. In this study, we modify this method for analyzing multi-gene datasets for plant DNA barcoding. The modified method includes an adjustable-weighted algorithm for the vector distance according to the ratio in sequence length of the candidate genes for...
Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms.
To explore the feasibility of parsimony analysis for large data sets, we conducted heuristic parsimony searches and bootstrap analyses on separate and combined DNA data sets for 190 angiosperms and three outgroups. Separate data sets of 18S rDNA (1,855 bp), rbcL (1,428 bp), and atpB (1,450 bp) sequences were combined into a single matrix 4,733 bp in length. Analyses of the combined data set sho...
Sweep Line Algorithm for Convex Hull Revisited
The convex hull of a set of given points is the intersection of all convex sets containing them. It is used as a primary structure in many other problems in computational geometry and in other areas such as image processing, model identification, geographical data systems, and triangular computation of a set of points. Computing the convex hull of a set of points is one of the most fundamental and imp...
Journal title:
- Systematic Biology
Volume 46, Issue 3
Publication date: 1997